Centroids: Gold standards with distributional variations

Authors

Ian Lewin

Şenay Kafkas

Dietrich Rebholz-Schuhmann

Abstract

Motivation: Gold Standards for named entities are, ironically, not standard themselves. Some specify the “one perfect annotation”. Others specify “perfectly good alternatives”. The concept of a Silver Standard is relatively new; its objective is consensus rather than perfection. How are the two concepts best represented and related?

Approach: We examine several Biomedical Gold Standards and motivate a new representational format, centroids, which simply and effectively represents name distributions. We define an algorithm for finding centroids, given a set of alternative input annotations, and we test the outputs quantitatively and qualitatively. We also define a metric of relative acceptability on top of the centroid standard.

Results: Precision, recall and F-scores of over 0.99 are achieved for the simple sanity check of giving the algorithm Gold Standard inputs. Qualitative analysis of the differences very often reveals errors and incompleteness in the original Gold Standard. Given automatically generated annotations, the centroids effectively represent the range of those contributions, and the quality of the centroid annotations is highly competitive with the best of the contributors.

Conclusion: Centroids cleanly represent alternative name variations for Silver and Gold Standards. A centroid Silver Standard is derived just like a Gold Standard, only from imperfect inputs.
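The abstract does not spell out the centroid algorithm, so the following is only a minimal sketch of the general idea, assuming a simple character-level voting scheme: each alternative annotation is a `(start, end)` character span, votes are counted per offset, and the "centroid" of a group of overlapping spans is taken to be the first run of offsets with the highest agreement. The function names (`vote_profile`, `centroids`, `prf`), the `min_votes` threshold and the exact-match scoring are all illustrative assumptions, not the authors' method.

```python
from collections import Counter


def vote_profile(annotations):
    """Count, for each character offset, how many annotations cover it."""
    votes = Counter()
    for start, end in annotations:
        for pos in range(start, end):
            votes[pos] += 1
    return votes


def _peak(region, votes, min_votes):
    """Return the first run of peak-agreement offsets in one region."""
    top = max(votes[p] for p in region)
    if top < min_votes:
        return []
    core = [p for p in region if votes[p] == top]
    end = core[0]
    for p in core[1:]:
        if p != end + 1:
            break  # keep only the first contiguous run at peak height
        end = p
    return [(core[0], end + 1, top)]


def centroids(annotations, min_votes=2):
    """Hypothetical centroid finder: one (start, end, votes) per region.

    A region is a maximal run of contiguous covered offsets, so directly
    adjacent annotations fall into the same region in this sketch.
    """
    votes = vote_profile(annotations)
    result, region = [], []
    for pos in sorted(votes):
        if region and pos != region[-1] + 1:
            result.extend(_peak(region, votes, min_votes))
            region = []
        region.append(pos)
    if region:
        result.extend(_peak(region, votes, min_votes))
    return result


def prf(predicted, gold):
    """Exact-match precision, recall and F-score over sets of spans."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Four alternative annotations of the text "human p53 protein":
# "p53" (twice), "human p53" and "p53 protein".
alternatives = [(6, 9), (0, 9), (6, 17), (6, 9)]
found = centroids(alternatives)
spans = [(s, e) for s, e, _ in found]
scores = prf(spans, [(6, 9)])
```

In this toy input all four annotators agree on offsets 6 to 9 ("p53"), so that span is the peak of the vote profile, while the longer variants survive only as lower-count boundaries around it. A real implementation would likely vote over token boundaries rather than raw characters and would keep the full distribution of alternative boundaries, which is what the relative-acceptability metric in the abstract appears to build on.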

Similar resources

A corpus-based evaluation method for Distributional Semantic Models

Evaluation methods for Distributional Semantic Models typically rely on behaviorally derived gold standards. These methods are difficult to deploy in languages with scarce linguistic/behavioral resources. We introduce a corpus-based measure that evaluates the stability of the lexical semantic similarity space using a pseudo-synonym same-different detection task and no external resources. We sho...

A Distributional Approach to Evaluating Ontology Learning Methods Using a Gold Standard

This paper presents a method for the evaluation of learned ontologies against gold standards. The proposed method transforms the ontology concepts to a vector space representation to avoid the common string matching of concepts at the lexical layer. We propose a set of evaluation measures that exploit the concepts’ representations and calculate the similarity of the two hierarchies. Experiments...

SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation

We present SimLex-999, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly quantifies similarity rather than association or relatedness so that pairs of entities that are associated but not actually similar (Freud, psychology) have a l...

Expanding a dictionary of marker words for uncertainty and negation using distributional semantics

Approaches to determining the factuality of diagnoses and findings in clinical text tend to rely on dictionaries of marker words for uncertainty and negation. Here, a method for semi-automatically expanding a dictionary of marker words using distributional semantics is presented and evaluated. It is shown that ranking candidates for inclusion according to their proximity to cluster centroids of...


Publication date: 2012
